Abstract. In preliminary analysis during an epidemic, data on
reported disease cases offer key information for guiding the direction
of in-depth analysis. Models for growth and transmission dynamics
depend heavily on the results of this preliminary analysis. When a
particular disease case is reported more than once, or is never
reported or detected in the population, multiple reporting or under
reporting may exist in the population. In this work, a theoretical
approach for studying reporting error in epidemiology is explored. The
upper bound for the error that arises due to multiple reporting is
higher than that which arises due to under reporting. Numerical
examples are provided to support the arguments. This article mainly
treats reporting error as deterministic; a stochastic model for the
same could also be explored.
1. Introduction

Reporting is one of the crucial elements of epidemiological research.
Its importance ranges from helping the baseline assessment of the
epidemic to understanding the rate of reproduction of infected
individuals. For example, a simple equation of the form
$I(t) = I(0)\exp(at)$ can be used to estimate $a$, the exponential
growth rate between the reported infection numbers $I(0)$ and $I(t)$
at times $0$ and $t$ ($t > 0$), respectively. When $I(0)$ and $I(t)$
suffer from reporting errors or lack accuracy, the computed growth
rate $a$ is misleading. There is evidence that under reporting of
cases leads to under estimation of incidence [1], [2] and to delay in
monitoring and surveillance [3], [4]. Some studies suggest that the
magnitude of the epidemic would have been better understood had there
been no under reporting [5], [6]. Since under reporting could mislead
the assessment of the impact of the epidemic, there were attempts to
understand the extent of under reporting using various surveys and
modeling [7], [8], [9], [10]. There are several deterministic and
stochastic models available for computing the growth rates of
epidemics, see [11], [12]. Certain methods fail to predict epidemic
growth accurately, or fail to ascertain the past trends of infections,
when reporting is incomplete. The method of back-calculation [13] for
estimating HIV infection fails to construct HIV trends accurately when
AIDS reporting is incomplete. Such methods are based on the fact that,
after assessing the number of individuals with infection, the duration
between infection times and disease times is used to project the
number of individuals with disease at some future time point. Here,
instead of handling the cases discretely, the convolution of the
infection density and the density of the duration between infection
and disease times is considered for the relevant continuous random
variables. Future numbers of individuals with disease projected using
back-calculation methods can be compared with the number of reported
disease cases at the same time point to obtain the reporting error of
disease cases. In applying such methods, it is implicitly assumed that
the population is closed to migration during the study period, or
between the two time points at which a reporting error is estimated.
Other popular methods for assessing reporting error include conducting
surveys at two or more time points on a population, which involve
either testing randomly selected blood samples for the infection under
study or assessing infected people through verbal autopsies, and then
comparing the estimated infection prevalence with the already existing
reported infection numbers for the same population at the same time.
In general, for simple or advanced models, if data suffer from under
reporting then usually the data are adjusted before applying a given
method. Reported incidence and prevalence are requirements for
validating models and forecasting. Also, the parameters derived from
these reported incidence trends are shown to be consistent in model
building and analysis [11].
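The effect of reporting error on the growth rate in $I(t) = I(0)\exp(at)$ can be illustrated with a minimal sketch. The case counts and reporting fractions below are hypothetical and chosen only for illustration; the function name `growth_rate` is likewise an assumption, not part of the original work.

```python
import math


def growth_rate(i0: float, it: float, t: float) -> float:
    """Exponential growth rate a solving I(t) = I(0) * exp(a * t)."""
    if i0 <= 0 or it <= 0 or t <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return math.log(it / i0) / t


# Hypothetical true counts: 50 cases at time 0 and 400 cases at time 10.
true_rate = growth_rate(50, 400, 10)

# Hypothetical reporting: 60% of cases reported at time 0, 30% at time 10.
reported_rate = growth_rate(0.6 * 50, 0.3 * 400, 10)

print(f"true rate     = {true_rate:.4f}")
print(f"reported rate = {reported_rate:.4f}")
```

In this sketch the estimated rate is distorted only because the reported fraction changes between the two time points; a constant reported fraction would cancel in the ratio $I(t)/I(0)$.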




Excessive media coverage of swine flu in some parts of the world led
to over magnification of the disease burden, as these preliminary
results were used in modeling epidemics in many countries during the
2009 outbreak of novel H1N1 influenza. It could have happened that, in
the 2009 swine flu outbreak, some studies disregarded the large number
of cases that did not lead to any serious complications. Protocols and
preparedness for a future pandemic based on the experience of the 2009
outbreak in Europe are well understood [14]. In a study on BSE (Bovine
Spongiform Encephalopathy) in France, it was found that some cases
were not detected by the surveillance system, which caused under
reporting of the epidemic [15]. In that study, the past trends were
reconstructed by back-calculation and adjusted for under reporting.
Another study on BSE in Britain examined the under reporting of cases
and differential mortality by improving the standard back-calculation
technique [16]. Measles data analysis in Italy indicated that under
reporting could be distorting observed epidemic patterns [17]. A study
on HIV [18] addresses over reporting of individuals on antiretroviral
therapy and the related caution to be taken while estimating their
number. The over reporting percentage was found to be important for
ascertaining actual epidemic levels of sexually transmitted infections
in Amsterdam [19].

There are several ways of quantifying the reporting error depending
upon the epidemic. These include comparing the incidence curve
obtained from models with the reported incidence of a given epidemic,
sample surveys, back-calculation methods etc.; for example, see [3],
[9], [13]. In this paper, a supplementary way is proposed for
understanding efficiency of reporting using limit analysis. In this
context, the term 'limit analysis' means that the rates of increase or
decrease of reporting efficiencies are studied over a very long period
of time, and also that situations such as the reported number of
disease cases approaching the actual number of disease cases are
studied while obtaining bounds of error. Our method treats reported
and actual disease cases as numbers on the real line, and functions of
the error of reporting are proposed to quantify the bounds of this
error. We introduce theoretical arguments in different settings and
illustrate them by numerical examples. The upper bound for this
calculated error that arises due to multiple reporting (excess
reporting) is shown here to be lower than that of error due to under
reporting. We also analytically show that even if error of reporting
is not observed, there is a possibility of multiple reporting in the
data. Realistic data fitting is not in the scope of the present work.
The results indicate that there is a serious consequence to multiple
reporting (a situation that arises when a case is reported more than
once).

2. Preliminaries 

Reporting of disease cases plays an important role in understanding 
epidemics. We provide two examples and two observations. 

Example 1. Consider a homogeneous population of 800 individuals,
where each individual has an equal chance of acquiring an infection of
type A virus. Suppose that, in a given year, 7 individuals were
reported by the health system as having acquired type A infection,
out of 26 individuals actually infected in that year. The reported
prevalence of type A virus in this year is 7/800 = 0.00875, but the
actual prevalence after adjusting for under reporting is
26/800 = 0.0325. The percentage of reported out of actual cases in
this situation is 26.92.
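The arithmetic of Example 1 can be reproduced with a short sketch; the variable names are illustrative assumptions, while the numbers are those of the example.

```python
# Numbers taken from Example 1; variable names are illustrative only.
population = 800
reported_cases = 7
actual_cases = 26

reported_prevalence = reported_cases / population
actual_prevalence = actual_cases / population
percent_reported = 100 * reported_cases / actual_cases

print(f"reported prevalence = {reported_prevalence:.5f}")
print(f"actual prevalence   = {actual_prevalence:.4f}")
print(f"percent reported    = {percent_reported:.2f}")
```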

Example 2. Let us now compute the incidence rate of type B virus in
a cohort study. Suppose a cohort of 750 individuals is followed for
one year, during which 17 new cases were reported to have acquired
type B virus out of 48 who actually acquired the virus in the same
year. Assuming a uniform distribution of infections over the year, the
reported incidence rate is 17/741.5 = 0.0229 per person-year, whereas
the actual incidence rate after adjusting for under reporting is
48/726 = 0.0661 per person-year. Note that each of the 17 reported
cases remained uninfected for six months on average, hence the 750
individuals were actually followed for 750 - 8.5 = 741.5 person-years
without being infected in that year. By a similar explanation for the
48 actual cases, we obtain 750 - 24 = 726 person-years. In an ideal
situation, a well designed cohort study contains at least information
on the number of individuals recruited for the study, the duration of
follow-up for each individual and the number of newly infected cases
during the study period. Among other reasons, under reporting could
also arise when both infection and recovery occur between two
follow-up visits, so that the virus is not detected at the next
follow-up, or when a case is not reported at the verbal autopsy
conducted at the next follow-up where clinical diagnosis for the
presence of the virus is carried out.
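The person-years calculation in Example 2 can be sketched as follows. The helper name `incidence_rate` and its arguments are assumptions made for illustration; the half-interval correction encodes the example's assumption that infections are spread uniformly over the follow-up period.

```python
def incidence_rate(cohort_size: int, new_cases: int, years: float = 1.0):
    """Return (rate per person-year, person-years at risk).

    Assumes infections are uniformly spread over the follow-up, so each
    case contributes on average half of the follow-up as uninfected time.
    """
    person_years = cohort_size * years - new_cases * years / 2
    return new_cases / person_years, person_years


reported_rate, reported_py = incidence_rate(750, 17)
actual_rate, actual_py = incidence_rate(750, 48)

print(f"reported: {reported_rate:.4f} per person-year over {reported_py} person-years")
print(f"actual:   {actual_rate:.4f} per person-year over {actual_py} person-years")
```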

The under reporting or over reporting of cases leads to errors in
assessing the epidemic spread through modeling. Total disease cases
(i.e. the number of actual cases) in the population could be taken
as the reported number plus or minus the error of reporting. In the
present work, the efficiency of reporting is studied by taking the
error as the difference between $\Lambda_h$ (number of total cases at
time $h$) and $\Omega_h$ (number of reported cases at time $h$). The
three situations that arise are: i) $\Lambda_h > \Omega_h$ (due to
under reporting), ii) $\Lambda_h < \Omega_h$ (due to over reporting)
and iii) $\Lambda_h = \Omega_h$ (due to accurate reporting or due to
no reporting error, when there are no multiple reported cases among
reported cases).

Observation 1. We saw from Examples 1 and 2 that there is no error
(or, as some may term it, no bias) in estimating incidence or
prevalence when the ratio $\Omega_h/\Lambda_h$ attains the value 1. We
define a neighbourhood around the actual cases $\Lambda_h$ for some
$\sigma > 0$ by
$B_\sigma(\Lambda_h) = \{b \in \mathbb{R} : |b - \Lambda_h| < \sigma\}$
and a neighbourhood around 1 for some $\omega > 0$ by
$A_\omega(1) = \{a \in \mathbb{R} : |a - 1| < \omega\}$. Then for
every $A_\omega(1)$, there exists a $B_\sigma(\Lambda_h)$ with the
property that for all $\Omega_h \in B_\sigma(\Lambda_h)$, it follows
that $\Omega_h/\Lambda_h \in A_\omega(1)$. In the next section, we
argue that $(\Omega_h)$ is bounded. By adopting results in [20] to the
present epidemiology scenario, we can deduce that $(\Omega_h)$ is
convergent when $(\Omega_h)$ is bounded (if we obtain the inequality
$2\Omega_{h+2} \leq \Omega_{h+1} + \Omega_h$). Further, under a
certain assumption, we see that $(\Omega_h)$ is convergent without the
above inequality. The fact that the above type of inequality is not
necessary for a bounded sequence to be convergent was discussed with
an example in [20].

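Observation 2. Let $x_h$ be a random variable such that
$x_h \in (0, 1)$. If
$\Lambda_m = \lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_m$
and $\Omega_m = \omega_1 + \omega_2 + \omega_3 + \cdots + \omega_m$,
then the following were observed [21]:

i) $\lambda_m > \omega_m$

ii) a relation between a sum over $h = 1, \ldots, m$ and a series over
$k = 0, 1, \ldots$ (see [21])

Further, when $\Omega$ follows a Poisson mass function with parameter
$P$ and the rate of decrease of the $x_h$'s is $c$, it was observed
[21] that the series built from terms of the form
$\exp\{-P\}P^{\Omega_h}$, $x_0\exp\{-(P + c\,h)\}P^{\Omega_h}$,
$\exp\{-(P + 2c\,h)\}P^{\Omega_h}$, ... is convergent.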
Multiple reporting phenomena might also contribute to a reduction of
efficiency in reported cases. In this work, efficiency is not only
measured as a difference of reported and total cases; the impact of
multiple reporting phenomena is also studied. The results presented
here are original and bring a new outlook to the study of epidemic
behavior.

3. Epidemic reporting efficiency

We denote by $\alpha_h$ the difference between reported and actual
cases at time $h$. If $\Lambda_h$ is total cases, $\Omega_h$ is
reported cases and $\alpha_h$ is the error of reporting taken over the
time $h$, then symbolically $\Lambda_h = \Omega_h \pm \alpha_h$. As
$\alpha_h$ tends to zero, $\Lambda_h \to \Omega_h$ for some
$h > N \in \mathbb{N}$ (section 3, [21]). $\Lambda_h$ is more than
$\Omega_h$ (in case of under reporting), $\Lambda_h$ is less than
$\Omega_h$ (in case of multiple reporting) and $\Lambda_h$ is equal to
$\Omega_h$ (in case of no reporting error). There is some possibility
that these under reported cases suffer from multiple reporting. For
instance, let $m_h$ be the number of individuals out of $\Omega_h$
who are reported exactly once, so that $\Omega_h - m_h$ is the number
reported more than once; then $\Omega_h = (\Omega_h - m_h) + m_h$.
This tells us that reported cases need not all be of different
individuals and could be the sum of those individuals whose cases were
reported more than once and those whose cases were reported only once.
If none of the individuals were reported exactly once (a rare event
that may arise in case of complete uncertainty of health diagnostics
and facilities), then all the reported cases are a sum of multiple
reporting cases. If we denote by $f$ the efficiency of reporting and
define it as the ratio of $\Omega_h$ and $\Lambda_h$, then $f$ could
vary over the time period depending upon the reporting system. If
multiple reporting is present then $f(x_h) = \Lambda_h/\Omega_h$, and
after adjusting for the excess number due to multiple reporting, the
resultant efficiency function will be $f_1(x_h) = \Lambda_h/m_h$,
where $m_h < \Omega_h$. Here $f_1 > f$. Similarly,
$\alpha_h = \Lambda_h - \Omega_h$ or $\Omega_h - \Lambda_h$, and the
adjusted error is $\Lambda_h - m_h$ or $m_h - \Lambda_h$, where
$m_h < \Omega_h$. If we assume $\alpha_h$ is constant over time (say,
$\alpha$) then the difference between $\Lambda_h$ and $\Omega_h$ is
constant over time $h$. We begin with the elementary case of epidemic
efficiency as a difference between reported and total cases and then
extend the case by varying efficiency.

3.1. ($\Lambda_h < \Omega_h$). This is a situation which arises due to
multiple reporting of cases. It occurs when individuals go to several
clinics or public medical setups to get a diagnosis and each of these
clinical or medical setups reports to the national level epidemic
surveillance. Individuals may prefer re-diagnosis either because they
do not have faith in the particular system where they were detected
with a disease or because they choose to reconfirm the diagnosis.
Since $\alpha > 0$, we have $\Omega_h - \alpha > 0$ and
$\Lambda_h > 0\ \forall h \in \mathbb{Z}^+$. Let us assume that the
epidemic grows exponentially and becomes severe as time progresses
(which is usual in the beginning for many epidemics); then
$(\Omega_h)$ can be taken as a monotonic increasing sequence. Let $W$
be the whole population; then $\Omega_h < CW\ \forall h$, where
$C \in \mathbb{R}^+$ is due to multiple reporting. At any given point
of time, $(\Omega_h)$ cannot be more than a finite multiple of the
total population. This is because even if the epidemic spreads to the
entire population and each case is reported multiple times, the total
will still be a finite number, i.e. $CW$ is finite. Hence
$(\Omega_h)$ is bounded and convergent. Since $\alpha$ is finite,
$(\Lambda_h)$ is also convergent. We have
$\Lambda_h^{-1} = (\Omega_h - \alpha)^{-1}
= \Omega_h^{-1}\{1 - (\alpha/\Omega_h)\}^{-1}$. From the properties of
numbers, whenever $\alpha/\Omega_h < 1$ we can bring in the inequality
$1 - (\alpha/\Omega_h)^2 < \exp(\alpha/\Omega_h)(1 - \alpha/\Omega_h) < 1$.
This implies $\exp(\alpha/\Omega_h) < \{1 - (\alpha/\Omega_h)\}^{-1}$,
so that
$(1/\Omega_h)\exp(\alpha/\Omega_h)
< \Omega_h^{-1}\{1 - (\alpha/\Omega_h)\}^{-1} = \Lambda_h^{-1}$.
Thus by simplifying we get
$\alpha < \Omega_h \ln(\Omega_h/\Lambda_h)\ \forall h$. Let $\Omega$
be the maximum of the $\Omega_h$ values and $\Lambda$ be the maximum
of the $\Lambda_h$ values; then $\Omega\ln(\Omega/\Lambda)$ can be
treated as an upper bound for $\alpha$. Let $(\Omega_h)$ be
monotonically non-increasing (so that the epidemic does not grow
exponentially), but always maintain the relation
$\Omega_h - \alpha > 0$, and attain a periodic maximum value with a
period of $H$ (say) time points. For this situation also
$\Omega\ln(\Omega/\Lambda)$ is an upper bound for $\alpha$. There is
a possibility of a smaller upper bound than this for $\alpha$. Even
if the $\Omega_h$ values stop showing the periodic maximum property
and increase after some $j > N \in \mathbb{N}$, we still have
$\alpha < \Omega_h\ln(\Omega_h/\Lambda_h)$. When $\Lambda_h \to 0$
then $\Omega_h \to 0$. As $\Lambda_h \to 0$, irrespective of whether
the error of reporting is high or low, disease cases will eventually
become zero, hence the study of $\alpha$ is not considered important
in this situation. Now, we begin with a trivial statement on total
reported cases.
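The bound $\alpha < \Omega_h\ln(\Omega_h/\Lambda_h)$ for the multiple reporting case can be checked numerically. The sketch below is only an illustration: the pairs of reported and actual cases are hypothetical, and the function name is an assumption.

```python
import math


def multiple_reporting_bound(omega: float, lam: float) -> float:
    """Upper bound omega * ln(omega / lam) on the error when omega > lam > 0."""
    if not 0 < lam < omega:
        raise ValueError("requires 0 < actual cases < reported cases")
    return omega * math.log(omega / lam)


# Hypothetical (reported, actual) pairs with reported > actual.
pairs = [(120, 100), (150, 100), (400, 100)]

for omega, lam in pairs:
    alpha = omega - lam  # error of reporting in this setting
    bound = multiple_reporting_bound(omega, lam)
    print(f"omega={omega}, lambda={lam}, alpha={alpha}, bound={bound:.2f}")
```

For each hypothetical pair the error $\alpha = \Omega_h - \Lambda_h$ stays below the bound, as the derivation above indicates.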

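Theorem 3. Let $\varepsilon > 0$. If $\Omega_h > \Lambda_h$ and
$\Omega_h$ is a monotonically increasing function, or is monotonic
non-increasing but $\Omega_h - \alpha > 0$, then there exists a point
in the sequence $(\Lambda_h)$ such that
$\Lambda_h \in B_\varepsilon(\Omega_h)$, where
$B_\varepsilon(\Omega_h)$ is the $\varepsilon$-neighbourhood of
$\Omega_h$.

Proof. Let $\varepsilon > 0$. We have seen in section 3.1 that
$\alpha < \Omega_h\ln(\Omega_h/\Lambda_h)$ when $(\Omega_h)$ is
monotonically increasing, as well as when $(\Omega_h)$ is not
monotonic increasing but $\Omega_h - \alpha > 0$. Therefore,
$|\Lambda_h - \Omega_h| < \Omega_h\ln(\Omega_h/\Lambda_h)$. When we
choose $\Lambda_h > \Omega_h/\exp(\varepsilon/\Omega_h)$ for some
$h > N$, then $\Lambda_h \in B_\varepsilon(\Omega_h)$. □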

3.2. ($\Lambda_h > \Omega_h$). This is a typical under reporting
situation which could arise from the following causes: incomplete
diagnosis, incomplete reporting of the diagnosed cases and under
detection of cases.



SHORT TITLE: THEORY ON REPORTING 






In this situation $\Lambda_h = \Omega_h + \alpha$. We have
$\alpha^{-1} = (1/\Lambda_h)(1 - \Omega_h/\Lambda_h)^{-1}$, where
$\Omega_h/\Lambda_h < 1$. Therefore
$1 - (\Omega_h/\Lambda_h)^2
< \exp(\Omega_h/\Lambda_h)(1 - \Omega_h/\Lambda_h) < 1$. This implies
$(1/\Lambda_h)\exp(\Omega_h/\Lambda_h)
< (1/\Lambda_h)\{1 - (\Omega_h/\Lambda_h)\}^{-1} = \alpha^{-1}$.
Therefore $\alpha < \Lambda_h\exp(-\Omega_h/\Lambda_h)$ and
$\Lambda\exp(-\Omega/\Lambda)$ is an upper bound. Even though reported
cases are fewer than actual cases, there is a possibility of multiple
reporting among under reported cases. Admitting this fact further
complicates the error associated with epidemic analysis. In the
presence of such multiple reporting, the under reporting is in fact
greater than what we normally admit without taking the 'multiple
reporting factor' (MRF) into account. In other words, by neglecting
MRF (when it is present in the data), the degree of reporting would
appear better, but it is a false degree of reporting (Fig. 3.1).
Therefore, MRF within under reporting implies that reporting is even
further below the total cases.
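The under reporting bound $\alpha < \Lambda_h\exp(-\Omega_h/\Lambda_h)$ can be checked in the same way. As a sketch, the pairs below reuse the actual and reported counts of Examples 1 and 2; the function name is an assumption.

```python
import math


def under_reporting_bound(lam: float, omega: float) -> float:
    """Upper bound lam * exp(-omega / lam) on the error when lam > omega > 0."""
    if not 0 < omega < lam:
        raise ValueError("requires 0 < reported cases < actual cases")
    return lam * math.exp(-omega / lam)


# (actual, reported) counts from Examples 1 and 2.
pairs = [(26, 7), (48, 17)]

for lam, omega in pairs:
    alpha = lam - omega  # error of reporting under under-reporting
    bound = under_reporting_bound(lam, omega)
    print(f"lambda={lam}, omega={omega}, alpha={alpha}, bound={bound:.2f}")
```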

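Theorem 4. If $\Lambda_h > \Omega_h$ then there exists a point in the
sequence $(\Lambda_h)$ such that
$\Lambda_h \in B_\varepsilon(\Omega_h)$, for every $\varepsilon > 0$.

Proof. Under the hypothesis, we have
$\alpha < \Lambda_h\exp(-\Omega_h/\Lambda_h)$. Therefore,
$|\Lambda_h - \Omega_h| < \Lambda_h\exp(-\Omega_h/\Lambda_h)$. Now,
when we choose $\Omega_h > \Lambda_h(\ln\Omega_h/\log\varepsilon)$ for
some $h > N$, then $\Lambda_h \in B_\varepsilon(\Omega_h)$. Note that
$\Omega_h > \Lambda_h(\ln\Omega_h/\log\varepsilon)
> \Lambda_h\exp(-\Omega_h/\Lambda_h)$. □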



3.2.1. Multiple reporting within $\Omega_h$. Let $K_h$ be a positive
integer defined as the number of classes at time $h$ which can
accommodate $\Omega_h$. Suppose $\Omega_h$ is completely made up of
$K_h$ (say) classes and each class consists of a finite number of
(multiple) reports of one individual. If every class consists of one
member then $\Omega_h = K_h$, a situation in which multiple reporting
among reported cases is avoided. On the other hand, if $\eta_h$ (out
of $K_h$) classes are empty (i.e. no reported case in these classes),
then this is compensated by more than one reported case in one or
more of the remaining $(K_h - \eta_h)$ classes ($N_h$, say) (see also
Fig. 3.2). As $\eta_h \to 0$, the reported cases (under reported
number) tend to represent true (actual) cases and are not affected by
multiple reporting of individual cases. The expected error in the
presence of under reporting is $\alpha = \Lambda_h - (K_h - \eta_h)$.
Even though $\eta_h \to 0$, we have to note that actual cases suffer
under reporting. We can observe that
$\alpha < \Lambda_h\exp(-N_h/\Lambda_h)$, and as $\eta_h \to 0$ then
$\alpha < \Lambda_h\exp(-K_h/\Lambda_h)$. Overall, as
$\alpha, \eta_h \to 0$, the reporting error is minimized and the total
reported cases equal the total (actual) cases (assuming diagnosis is
complete). If the case counts grow without bound, then as
$\eta_h \to K_h$ (or $\eta_h$ is high), the error of reporting is very
high. If $\Omega_h \approx$ const., then as $\eta_h \to K_h$, the
error of reporting will be still more than expected. When
$\Lambda_h \to 0$, then as $\eta_h \to K_h$ the error of reporting
will decline too. But this violates the assumption that reporting
error is constant. This condition is out of the scope of this section
and we discuss these issues in the next section. A lower $\eta_h$
implies a lower level of multiple reporting in the population.

{ 1,1,1,1,2,2,2,2,2,8,8,12,12,12,5,15,17,17,23,23,23 }
Effective reporting (under) number in the presence of MRF

{ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21 }
Reported cases (out of actual cases)

{ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25 }
Actual cases

Figure 3.1. Schematic diagram of the 'multiple reporting factor'
within under reporting. In the first row, we observe that 21 cases
are reported for an epidemic in a certain time period. If we assume
there is no multiple reporting among these 21 reported cases, we can
consider them as the total reported in this period. If we report them
as provided in the second row, then the ratio of reported cases to
the actual disease cases (see third row) is 21/25 = 0.84. However,
observe that, out of the 21 cases reported in the first row, case 1
is reported 4 times, case 2 is reported 5 times, and so on up to case
23, which is reported 3 times. Removing multiply reported cases from
the first row, the number of distinct cases reported is only 8, thus
the ratio of reported (after adjusting for multiple reporting) to
actual cases reduces to 8/25 = 0.32.
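The counts quoted in the caption of Figure 3.1 can be reproduced from the first row of the figure. This is a minimal sketch; the variable names are illustrative assumptions.

```python
from collections import Counter

# First row of Figure 3.1: case identifiers as they were reported.
reports = [1, 1, 1, 1, 2, 2, 2, 2, 2, 8, 8, 12, 12, 12,
           5, 15, 17, 17, 23, 23, 23]
actual_cases = 25

counts = Counter(reports)
total_reports = len(reports)                 # reports including duplicates
distinct_cases = len(counts)                 # individuals reported at least once
reported_once = sum(1 for c in counts.values() if c == 1)

naive_ratio = total_reports / actual_cases      # ignores multiple reporting
adjusted_ratio = distinct_cases / actual_cases  # duplicates removed

print(total_reports, distinct_cases, reported_once)
print(naive_ratio, adjusted_ratio)
```

Here `reported_once` corresponds to the individuals reported exactly once, while `distinct_cases` counts every individual that appears at least once among the reports.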

Lemma 5. $\eta_h \to 0 \Rightarrow (\Lambda_h - K_h) \to \alpha$.

Proof. We know that $\eta_h \to 0 \Rightarrow \Omega_h \to K_h$. By
the algebraic limit principle, for a given constant $\alpha$,
$\Omega_h + \alpha \to K_h + \alpha$. Therefore
$\Lambda_h \to K_h + \alpha$. This implies that for all
$\varepsilon > 0$ there exists an integer $N_h$ such that
$h > N_h \Rightarrow |\Lambda_h - K_h - \alpha| < \varepsilon$.
Therefore $(\Lambda_h - K_h) \to \alpha$. □

Corollary. $\alpha \to 0 \Rightarrow \Lambda_h \to K_h$.

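Suppose $\eta_h > 0$; this means there are some empty classes out of
the $K_h$ classes, so that $K_h < \Omega_h$. This implies
$K_h/\Omega_h < 1$ and

    [display inequality in $\exp(K_h/\Omega_h)$; not recoverable from the source]

This leads to

    [display expression not recoverable from the source]

Since $K_1/\Omega_1, K_2/\Omega_2, \ldots$ are positive and each is
less than 1, we get

    [display inequality not recoverable from the source]

(see Remark 9 in Appendix I and also Appendix II). Suppose we relax
the assumption on empty classes by allowing $\eta_h \geq 0$; then
$K_h/\Omega_h \in [0, 1]$. In this case we can use the Weierstrass
inequality of the type

    (3.1)  [display inequality not recoverable from the source]

When $K_j/\Omega_j > 0$, we have

    (3.2), (3.3)  [display inequalities not recoverable from the source]

(from the result by [22]), and for $K_j/\Omega_j \in [0, 1]$, we have

    (3.4)  [display inequality not recoverable from the source]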
Figure 3.2. Panels: 'When cases are reported only once' and 'When MRF
is present'. This figure indicates that if $\eta_h$ (out of $K_h$)
classes are empty (i.e. no reported case in these classes), then this
is compensated by more than one reported case in one or more of the
remaining $(K_h - \eta_h)$ classes.



Suppose $K_j/\Omega_j$ over $j$ for $j = 1, 2, \ldots, h$ form a
probability distribution; then we can arrive at the following
inequality (for details refer to [?], [21]):

    (3.5)  [display inequality not recoverable from the source]

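Theorem 6. If $\Lambda_h > \Omega_h$ and MRF is present then
$\Lambda_h \in B_\varepsilon(K_h - \eta_h)$.

Proof. We have seen in section 3.2.1 that
$\alpha < \Lambda_h\exp\{-(K_h - \eta_h)/\Lambda_h\}$. Therefore,
$|\Lambda_h - (K_h - \eta_h)|
< \Lambda_h\exp\{-(K_h - \eta_h)/\Lambda_h\}$. When we choose

    $K_h > \Lambda_h \dfrac{\ln(K_h - \eta_h)}{\ln\varepsilon} + \eta_h$

for some $h > N$ then the result follows. □

Note 7. When Lemma 5 is true then $\Lambda_h \in B_\varepsilon(K_h)$.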

3.3. ($\Lambda_h = \Omega_h$). In this situation, the error of
reporting is evidently null. However, the possibility of MRF cannot
be ruled out. Suppose $\Omega_h$ is formed of $K_h$ classes as we saw
in section 3.2.1 and $\Omega_h = K_h$; then $\alpha = 0$. If
$\Omega_h > K_h$, then the arguments presented in section 3.2.1 hold
here and a similar error exists.
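3.4. Stratification of error by location and time. Let $U$ and $V$ be
$s \times t$ matrices of reported cases and total cases across $s$
geographical locations for $t$ time points. $U$ is represented by

$$U = \begin{bmatrix}
\Omega_{11} & \Omega_{12} & \cdots & \Omega_{1t} \\
\Omega_{21} & \Omega_{22} & \cdots & \Omega_{2t} \\
\vdots & \vdots & & \vdots \\
\Omega_{s1} & \Omega_{s2} & \cdots & \Omega_{st}
\end{bmatrix},$$

where $\Omega_{ij}$ denotes the cases in the $i$-th location at the
$j$-th time point (for $i = 1, 2, \ldots, s$ and
$j = 1, 2, \ldots, t$). Let $\Omega_j = \sum_{i=1}^{s}\Omega_{ij}$ and
$\Omega = \sum_{j=1}^{t}\sum_{i=1}^{s}\Omega_{ij}$. If $\alpha_{ij}$
denotes the error of reporting in the $i$-th location at the $j$-th
time point, then $V$ can be represented by

$$V = \begin{bmatrix}
\Omega_{11} \pm \alpha_{11} & \Omega_{12} \pm \alpha_{12} & \cdots & \Omega_{1t} \pm \alpha_{1t} \\
\Omega_{21} \pm \alpha_{21} & \Omega_{22} \pm \alpha_{22} & \cdots & \Omega_{2t} \pm \alpha_{2t} \\
\vdots & \vdots & & \vdots \\
\Omega_{s1} \pm \alpha_{s1} & \Omega_{s2} \pm \alpha_{s2} & \cdots & \Omega_{st} \pm \alpha_{st}
\end{bmatrix}
= \begin{bmatrix}
\Lambda_{11} & \Lambda_{12} & \cdots & \Lambda_{1t} \\
\Lambda_{21} & \Lambda_{22} & \cdots & \Lambda_{2t} \\
\vdots & \vdots & & \vdots \\
\Lambda_{s1} & \Lambda_{s2} & \cdots & \Lambda_{st}
\end{bmatrix}.$$

If

    $\alpha_{1j} \neq 0$ for $j = 1$ and $\alpha_{1j} = 0\ \forall j > 1$,
    $\alpha_{2j} \neq 0$ for $j = 1, 2$ and $\alpha_{2j} = 0\ \forall j > 2$,
    ...
    $\alpha_{sj} \neq 0$ for $j = 1, 2, \ldots, t$,

then the characteristic roots are
$\Lambda_{11}, \Lambda_{22}, \ldots, \Lambda_{st}$. In the presence of
an epidemic, we have $\Omega_{11} \neq 0$, $\Omega_{22} \neq 0$, ...,
$\Omega_{st} \neq 0$, hence $V$ can never be singular. In this
situation, $V$ is always invertible, such that: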
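$$\begin{bmatrix}
\Lambda_{1\cdot} \\ \Lambda_{2\cdot} \\ \vdots \\ \Lambda_{s\cdot}
\end{bmatrix}
= \begin{bmatrix}
\sum_{j=1}^{t}(\Omega_{1j} \pm \alpha_{1j}) \\
\sum_{j=1}^{t}(\Omega_{2j} \pm \alpha_{2j}) \\
\vdots \\
\sum_{j=1}^{t}(\Omega_{sj} \pm \alpha_{sj})
\end{bmatrix}$$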



If the error in reporting cases does not follow any pattern, then the
relationship between $U$ and $V$ follows a random process. Care is
needed in understanding the variability in the error, especially if
the pandemic persists in the population for a long duration.
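The entrywise relation between reported and total cases, $\Lambda_{ij} = \Omega_{ij} \pm \alpha_{ij}$, can be sketched with small matrices. All numbers below are hypothetical; a positive signed error stands for under reporting and a negative one for multiple reporting, which is a convention assumed here for illustration.

```python
# Hypothetical reported cases for s = 2 locations and t = 3 time points.
U = [
    [12, 15, 20],
    [8, 9, 14],
]

# Hypothetical signed errors (positive: under reporting, negative: multiple reporting).
A = [
    [3, 2, 0],
    [-1, 0, 4],
]

# Total cases: entrywise sum of reported cases and signed error.
V = [[u + a for u, a in zip(u_row, a_row)] for u_row, a_row in zip(U, A)]

totals_by_location = [sum(row) for row in V]     # sums over time points
totals_by_time = [sum(col) for col in zip(*V)]   # sums over locations

print(V)
print(totals_by_location, totals_by_time)
```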



4. Varying epidemic efficiency function

We saw in the previous section that error of reporting plays an
important role in understanding the epidemic even though it is taken
as $\Lambda_h - \Omega_h$ over $h$. Here in this section, it is
assumed to be a continuous random variable with a probability density
function (say $\psi(\alpha)$). This assumption allows variation in
the error of reporting over the time period $h$. Now the relation
between total and reported cases is taken as
$\Lambda_h = \Omega_h \pm \bar{\alpha}$, where
$\bar{\alpha} = \int \alpha\,\psi(\alpha)\,d\alpha$ (mean reporting
error).

The error of reporting might increase rapidly, stay steady, or
decline after a certain time point since the beginning of an
epidemic. Suppose the epidemic hits at time $t_0$; then the error
might increase or decrease till $t_k$ and then change its direction
asymptotically (where $t_0 < t_k$). The rate of increase or decrease
from $t_0$ to $t_k$ could be fast or slow. To fit all such
situations, we choose Weibull and gamma functions and try to explain
the error involved through them. These two distributions can imitate
several functional forms of the nature of the error that we are
interested in. Historically, the Weibull distribution has been very
popular in reliability analysis, and recently it was found to give
satisfactory results in modeling the incubation period of AIDS [25]
and the survival distribution while analysing bird flu data [26].
There are instances where the gamma distribution also worked as a
reliable model to explain the incubation period of AIDS. These two
distributions were able to capture the variability in the incubation
period because of their versatile nature. Suppose $\alpha \sim$
Weibull density with scale parameter $\theta$ and shape parameter
$\pi$; then the mean of the error function is
$\theta\Gamma(1 + 1/\pi)$ and
$\Lambda_h = \Omega_h \pm \theta\Gamma(1 + 1/\pi)$. Unless the
reporting is extremely poor, we need not expect the situation
$\Omega_h < \theta\Gamma(1 + 1/\pi)$, hence we assume
$\Omega_h > \theta\Gamma(1 + 1/\pi)\ \forall h$. This assumption is
also supported by the fact that, under a Weibull density,
$\alpha \to 0$ ($\alpha \neq 0$) in the limiting case. When
$\alpha \sim$ gamma density with scale parameter $\lambda$ and shape
parameter $\nu$, then the mean of the error function is $\nu/\lambda$
and $\Lambda_h = \Omega_h \pm \nu/\lambda$.
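The two mean errors used in this section, $\theta\Gamma(1 + 1/\pi)$ for the Weibull case and $\nu/\lambda$ for the gamma case, can be evaluated directly. The parameter values and the reported count below are hypothetical and serve only as a sketch of the calculation for the under reporting case.

```python
import math


def weibull_mean_error(theta: float, shape: float) -> float:
    """Mean theta * Gamma(1 + 1/shape) of a Weibull error with scale theta."""
    return theta * math.gamma(1.0 + 1.0 / shape)


def gamma_mean_error(nu: float, lam: float) -> float:
    """Mean nu / lam of a gamma error, in the parameterisation used in the text."""
    return nu / lam


omega_h = 95  # hypothetical reported cases at time h

# Hypothetical parameters; total cases under under-reporting (plus sign).
total_weibull = omega_h + weibull_mean_error(theta=5.0, shape=2.0)
total_gamma = omega_h + gamma_mean_error(nu=10.0, lam=2.0)

print(f"Weibull-based estimate of total cases: {total_weibull:.3f}")
print(f"gamma-based estimate of total cases:   {total_gamma:.3f}")
```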

When total cases exceed reported cases, the MRF discussed in the
previous section could exist. In such a situation, the error
estimated above using the two densities will be an under estimate.
Let $\eta$ be the factor due to MRF, which follows a Weibull density
with parameters $(p, q)$, and let $\phi(\eta)$ be the associated
probability density function. If $\bar{\eta}$ is the mean number of
empty classes out of $K$ classes, then the mean error in the presence
of MRF is $\alpha'$ (say) $= \bar{\alpha} + \bar{\eta}$. Now, the
total cases can be estimated as
$\Lambda_h = \Omega_h + p\,\gamma\{(1 + 1/q), (K/p)^q\}
+ \theta\Gamma(1 + 1/\pi)$ (for Weibull) and
$\Lambda_h = \Omega_h + p\,\gamma\{(1 + 1/q), (K/p)^q\}
+ \nu/\lambda$ (for gamma), where $\gamma\{\cdot,\cdot\}$ is the
incomplete gamma function. See Remark 10 in the appendix for the
derivation of $\alpha'$. See also the difference in the mean error
among 10 pairs of $(\Lambda_h, \Omega_h)$ for the
$\Lambda_h > \Omega_h$ situation given in Example 8.
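The additional term $p\,\gamma\{(1 + 1/q), (K/p)^q\}$ involves the lower incomplete gamma function $\gamma(a, x) = \int_0^x t^{a-1}e^{-t}\,dt$, which the Python standard library does not provide. The sketch below approximates it with composite Simpson integration; this numerical scheme, the function names and all parameter values are assumptions made for illustration.

```python
import math


def lower_incomplete_gamma(a: float, x: float, steps: int = 2000) -> float:
    """Approximate the integral of t**(a-1) * exp(-t) over [0, x], for a >= 1."""
    if a < 1 or x < 0:
        raise ValueError("this sketch requires a >= 1 and x >= 0")
    if x == 0:
        return 0.0
    n = steps if steps % 2 == 0 else steps + 1  # Simpson needs an even count
    h = x / n

    def integrand(t: float) -> float:
        return t ** (a - 1) * math.exp(-t)

    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def mrf_mean_error(p: float, q: float, k_max: float) -> float:
    """Additional mean error p * gamma(1 + 1/q, (k_max / p)**q) due to MRF."""
    return p * lower_incomplete_gamma(1.0 + 1.0 / q, (k_max / p) ** q)


# Hypothetical parameters for illustration.
omega_h = 95
extra = mrf_mean_error(p=10.0, q=2.0, k_max=40.0)
weibull_part = 5.0 * math.gamma(1.0 + 1.0 / 2.0)
print(f"estimated total cases: {omega_h + extra + weibull_part:.3f}")
```

For large $K$ the additional term approaches $p\,\Gamma(1 + 1/q)$, the untruncated Weibull mean, which gives a quick sanity check on the approximation.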

Example 8. A numerical example is given to show the difference
between the mean error ($\alpha$) and the true mean error ($\alpha'$)
when multiple reporting is present and $\Lambda_h > \Omega_h$.

$(\Lambda_h, \Omega_h)$    $\alpha$    $(\Lambda_h, K_h - \eta_h)$    $\alpha'$
(100, 95)                  5           (100, 95 - 40)                 45
(90, 82)                   8           (90, 82 - 38)                  46
(110, 100)                 10          (110, 100 - 40)                50
(95, 80)                   15          (95, 80 - 40)                  55
(102, 90)                  12          (102, 90 - 35)                 47
(90, 80)                   10          (90, 80 - 30)                  40
(117, 110)                 7           (117, 110 - 20)                27
(105, 100)                 5           (105, 100 - 17)                22
(197, 194)                 3           (197, 194 - 12)                15
(208, 206)                 2           (208, 206 - 8)                 10

Mean of $\alpha$ over the ten pairs: $\bar{\alpha} = 7.7$.
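The errors in Example 8 can be recomputed from the ten pairs. The sketch below stores each row as (total cases, reported cases, empty classes); the tuple layout and variable names are illustrative assumptions, while the numbers are those of the example.

```python
# Rows of Example 8 as (total cases, reported cases, empty classes).
rows = [
    (100, 95, 40), (90, 82, 38), (110, 100, 40), (95, 80, 40),
    (102, 90, 35), (90, 80, 30), (117, 110, 20), (105, 100, 17),
    (197, 194, 12), (208, 206, 8),
]

# Error ignoring multiple reporting, and error after removing empty classes.
alpha = [total - reported for total, reported, _ in rows]
alpha_prime = [total - (reported - empty) for total, reported, empty in rows]

mean_alpha = sum(alpha) / len(alpha)
mean_alpha_prime = sum(alpha_prime) / len(alpha_prime)

print(alpha, mean_alpha)
print(alpha_prime, mean_alpha_prime)
```

The mean of the errors that ignore multiple reporting reproduces the value 7.7 quoted in the example; the mean of the adjusted errors is computed here from the same rows.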



MRF can be viewed as a multivariate variable, and in such a situation
the error estimation will differ from the above. A discussion of the
multivariate Weibull distribution can be found elsewhere [27], [28].
In these works the authors demonstrate estimation of parameters when
there are more than two parameters.

5. Conclusions 

Mathematical modeling makes an important contribution to
understanding an epidemic outbreak and its spread. Reported
infections or disease cases are vital inputs to these models.
However, cases that are not reported, or that are over reported, lead
to limitations in assessing the epidemic spread. Usually,
mathematical models in the epidemiology of infectious diseases
consist of several parameters, including those that determine the
growth of an epidemic. Growth of an epidemic at the initial stage is
estimated by conducting trend analysis of reported cases. Unless
reported cases are adjusted for under reporting (if such exists) and
the corresponding growth rates are revised before plugging them into
models, the models often will not predict the spread of infection
accurately. The difficulty lies in understanding the degree of under
reporting when a trend analysis on reported cases is conducted.
Sometimes reporting may be accurate in a few reporting centers, but
these centers might not be representative of the entire population
for which we are interested in predicting the future course of an
epidemic using mathematical models. Further, the presence of multiple
reporting within under reporting of disease cases could complicate
the assessment of the degree of under reporting, and hence the
calculation of growth parameters required for modeling the spread is
not straightforward. In recent outbreaks of SARS there was some
concern about under reporting [30] and over reporting [31]; however,
it was concluded later that there was no evidence of over reporting
of SARS [32]. We conclude that systematic adjustment for under
reporting, and for multiple reporting within under reporting, is
needed before analyzing hospital based data, if such issues exist in
the data. In this note, the total disease cases that occurred in a
given population were taken as reported cases plus or minus the error
of reporting. We have theoretically analyzed the degree of reporting
error involved in under, over and multiple reporting of disease
cases. We saw that errors have upper bounds
$\Omega\ln(\Omega/\Lambda)$ when $\Lambda_h < \Omega_h$ and
$\Lambda\exp(-\Omega/\Lambda)$ when $\Lambda_h > \Omega_h$. The
multiple reporting factor (MRF) influences the error estimates when
disease cases are reported more than once and overall there is under
reporting in an outbreak. We have explained schematically as well as
numerically the impact of this multiple reporting through a factor
$\eta$. When reported cases suffer from under reporting, the upper
bound for error is larger. In the presence of MRF and
$\Lambda_h > \Omega_h$, these bounds increase further.

When the error is assumed to be a continuous random variable which
follows one of two probability density functions, viz. Weibull or
gamma, the relation between total and reported cases is given in
terms of the respective means obtained from these densities. Also,
for the continuous case the impact of MRF is studied and the error is
derived using probability density functions. The error function
expressed in terms of the incomplete gamma function can be
numerically explored. Such functions can also be applied for
computation of bounds of life expectancy in human populations [33].
When reported cases are completely made up of $K_h$ classes out of
which $\eta_h$ classes are empty (i.e. with no reporting in these
classes), we showed that the additional error
$p\,\gamma\{(1 + 1/q), (K/p)^q\}$ would be an algebraic addition to
the error without MRF. It was also shown that as $\eta_h \to 0$,
$\Lambda_h - K_h \to \alpha$. Recall that $K_h$ is a positive integer
defined as the number of classes at time $h$ which can accommodate
$\Omega_h$.

In the case of emerging or newly identified pandemics, reporting error
could follow a random pattern. Reporting can also vary across countries
during new epidemics because of a lack of proper guidelines
and protocols for diagnosis. The matrix analysis presented can be
extended to a global epidemic, where the status of error in each country
depends on country-specific guidelines. The results presented in this
work help in framing protocols for analyzing and reporting epidemic
data. The results can be useful in careful handling of the various sources of
potential error due to multiple reporting on its own and multiple
reporting within under reporting. The kind of analysis presented here,
applied to epidemics, is new and probably at an initial stage. We are
able to address issues related to the importance of adjusting for multiple
reporting error by this method. The ideas presented could lead to new
theoretical approaches and could also supplement existing
methods in epidemic analysis.



Appendix I 

Remark 9. Suppose $0 < [\ldots] < 1$ for all $j = 1, 2, \dots, h$. Then

[displayed inequalities illegible]

and so on up to the $h$-th term.
Therefore, we get

[displayed inequality illegible]

This kind of inequality is also called a Weierstrass-type inequality.
The original inequality is given in Appendix II.

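Remark 10. Let $\alpha$ and $\eta$ be two continuous random variables with
$0 < \alpha < \infty$ and $0 < \eta < K$, where $K$ is the maximum number
of empty classes that $\eta$ can attain. We know that $E(\alpha') = E(\alpha) +
E(\eta)$, where $E$ is the expectation or mean of the random variable. This means
$\alpha' = \alpha + \eta$. Let $\alpha \sim$ Weibull$(\theta, n)$ and $\eta \sim$ Weibull$(p, q)$; then

$$\alpha' = \int_0^{\infty} \alpha\, \frac{n}{\theta} \left(\frac{\alpha}{\theta}\right)^{n-1} \exp\left\{-\left(\frac{\alpha}{\theta}\right)^{n}\right\} d\alpha + \int_0^{K} \eta\, \frac{q}{p} \left(\frac{\eta}{p}\right)^{q-1} \exp\left\{-\left(\frac{\eta}{p}\right)^{q}\right\} d\eta .$$

Taking $(\eta/p)^q = u$ and changing the limits accordingly,
we get as below

$$\begin{aligned}
\alpha' &= \theta\,\Gamma\left(1 + \frac{1}{n}\right) + \frac{q}{p} \int_0^{(K/p)^q} p\,u^{1/q}\, u^{1 - 1/q} \exp(-u)\, \frac{p}{q}\, u^{1/q - 1}\, du \\
&= \theta\,\Gamma\left(1 + \frac{1}{n}\right) + p \int_0^{(K/p)^q} u^{1/q} \exp(-u)\, du \\
&= \theta\,\Gamma\left(1 + \frac{1}{n}\right) + p\,\gamma\left\{(1 + 1/q),\, (K/p)^q\right\} .
\end{aligned}$$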
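
The closed form of Remark 10, $\theta\,\Gamma(1 + 1/n) + p\,\gamma\{(1 + 1/q), (K/p)^q\}$, can be checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the original derivation: the helper names are hypothetical, the lower incomplete gamma function is evaluated with its standard power series, and Simpson's rule on the untransformed integral over $(0, K)$ serves as an independent check. The parameter values are arbitrary.

```python
import math


def lower_incomplete_gamma(a, x, tol=1e-15, max_terms=10000):
    """gamma(a, x) = integral_0^x u**(a-1) * exp(-u) du, computed from the series
    exp(-x) * x**a * sum_{k>=0} x**k / (a * (a+1) * ... * (a+k))."""
    if a <= 0 or x < 0:
        raise ValueError("expected a > 0 and x >= 0")
    if x == 0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return math.exp(-x) * x ** a * total


def mean_error_with_mrf(theta, n, p, q, K):
    """theta * Gamma(1 + 1/n) + p * gamma(1 + 1/q, (K/p)**q)."""
    return theta * math.gamma(1 + 1 / n) + p * lower_incomplete_gamma(1 + 1 / q, (K / p) ** q)


def truncated_weibull_mean(p, q, K, steps=2000):
    """Simpson's rule for integral_0^K x * f(x) dx with f the Weibull(p, q) density.
    The integrand x * f(x) simplifies to q * (x/p)**q * exp(-(x/p)**q)."""
    def integrand(x):
        z = (x / p) ** q
        return q * z * math.exp(-z)

    h = K / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(K)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


if __name__ == "__main__":
    p, q, K = 3.0, 2.0, 4.0  # arbitrary illustrative values
    print(p * lower_incomplete_gamma(1 + 1 / q, (K / p) ** q))
    print(truncated_weibull_mean(p, q, K))
```

The two printed values should agree closely, and the additional term vanishes as $K$ shrinks to zero, which is consistent with the limit stated for the case of no empty classes.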

Note 11. Other possible assumptions, such as $\eta \sim$ gamma$(p, q)$, and the
derivation of the corresponding mean error are left as an exercise.



Appendix II: Results due to Copson [20], Klamkin and
Newman [22], Klamkin [23], El-Neweihi and Proschan [24]



E. T. Copson proved that a bounded sequence of real numbers $(a_n)$
is convergent if the inequality $a_{n+2} \le \frac{1}{2}(a_{n+1} + a_n)$ is satisfied.

He further proved a more general theorem, whose statement is as
follows:

Theorem. If $(a_n)$ is a bounded sequence which satisfies the inequality

$$a_{n+r} \le \sum_{s=1}^{r} k_s\, a_{n+r-s},$$

where the coefficients $k_s$ are strictly positive
and $k_1 + k_2 + \dots + k_r = 1$, then $(a_n)$ is a convergent sequence. But if
$(a_n)$ is unbounded, it diverges to $-\infty$.
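
The two cases of Copson's result for $r = 2$ can be illustrated with a short Python sketch. It is a numerical illustration only and proves nothing; the function names and the `slack` parameter are hypothetical devices used to build sequences that satisfy $a_{n+2} \le \frac{1}{2}(a_{n+1} + a_n)$.

```python
def copson_sequence(a0, a1, n_terms, slack=0.0):
    """Build a_{n+2} = (a_{n+1} + a_n) / 2 - slack.
    For slack >= 0 every term satisfies a_{n+2} <= (a_{n+1} + a_n) / 2."""
    if slack < 0:
        raise ValueError("slack must be non-negative")
    seq = [a0, a1]
    while len(seq) < n_terms:
        seq.append(0.5 * (seq[-1] + seq[-2]) - slack)
    return seq


def satisfies_copson(seq, tol=1e-12):
    """Check the inequality a_{n+2} <= (a_{n+1} + a_n) / 2 along the whole sequence."""
    return all(
        seq[i + 2] <= 0.5 * (seq[i + 1] + seq[i]) + tol
        for i in range(len(seq) - 2)
    )


if __name__ == "__main__":
    bounded = copson_sequence(0.0, 3.0, 60)               # equality case, stays bounded
    unbounded = copson_sequence(0.0, 3.0, 60, slack=1.0)  # constant slack, drifts downward
    print(bounded[-1], unbounded[-1])
```

With zero slack the quantity $a_{n+1} + \frac{1}{2} a_n$ is preserved from step to step, so the sequence settles at $(a_0 + 2a_1)/3$; with a constant positive slack the sequence is unbounded and keeps decreasing, in line with the divergence to $-\infty$ stated in the theorem.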

Weierstrass inequalities [33] (also available in [?]) are given by

(5.2)  $1 - S_1 < \prod_{i=1}^{n} (1 - A_i) < (1 + S_1)^{-1}$

(5.3)  $1 + S_1 < \prod_{i=1}^{n} (1 + A_i) < (1 - S_1)^{-1}$

where $A_1, A_2, \dots, A_n$ are real numbers in $[0, 1]$ and $S_1 = \sum_{i=1}^{n} A_i$, with
$S_1 < 1$ in the inequality (5.3). M. S. Klamkin and D. J. Newman [22]
have extended the Weierstrass inequalities and showed that



(5.4)  $\prod_{i=1}^{n} (1 + A_i) \ge (n + 1)^n \prod_{i=1}^{n} A_i$

(5.5)  $\prod_{i=1}^{n} (1 - A_i) \ge (n - 1)^n \prod_{i=1}^{n} A_i$

where $A_i > 0$, $i = 1, 2, \dots, n$ and $\sum_{i=1}^{n} A_i = 1$. M. S. Klamkin [23]
further proved, under the same conditions, that

(5.6)  [inequality illegible]

with equality only if $A_i = 1/n$. E. El-Neweihi and F. Proschan [24]
established the Weierstrass-product type inequalities (5.4), (5.5), (5.6) by
a uniform approach, using the powerful tools of majorization and
Schur-convex and Schur-concave functions.
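
Inequalities (5.2) to (5.5) are easy to spot-check numerically. The Python sketch below is only such a check under the stated conditions, with hypothetical function names and arbitrary test vectors; it does not replace the proofs in the cited works.

```python
import math


def weierstrass_holds(A):
    """Check (5.2) and (5.3) for numbers A_i in (0, 1) whose sum S_1 is below 1.
    Returns a pair of booleans, one per inequality."""
    s1 = sum(A)
    prod_minus = math.prod(1 - a for a in A)
    prod_plus = math.prod(1 + a for a in A)
    ok_52 = 1 - s1 < prod_minus < 1 / (1 + s1)
    ok_53 = 1 + s1 < prod_plus < 1 / (1 - s1)
    return ok_52, ok_53


def klamkin_newman_gaps(A):
    """Left side minus right side of (5.4) and (5.5) for positive A_i summing to 1.
    Both gaps should be non-negative."""
    n = len(A)
    prod_a = math.prod(A)
    gap_54 = math.prod(1 + a for a in A) - (n + 1) ** n * prod_a
    gap_55 = math.prod(1 - a for a in A) - (n - 1) ** n * prod_a
    return gap_54, gap_55


if __name__ == "__main__":
    print(weierstrass_holds([0.1, 0.2, 0.3]))    # S_1 = 0.6 < 1
    print(klamkin_newman_gaps([0.2, 0.3, 0.5]))  # entries sum to 1
    print(klamkin_newman_gaps([1 / 3] * 3))      # equal entries, gaps near zero
```

With all $A_i$ equal to $1/n$ both gaps vanish up to rounding, which shows that (5.4) and (5.5) are attained with equality there.
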
Parameter      Definition

$\Lambda_h$    number of total disease cases at time $h$
$\Omega_h$     number of reported disease cases at time $h$
[?]            $\sum_k \Lambda_k$, where $\Lambda_k$ is number of total disease cases at time $h$
[?]            $\sum_k \Omega_k$, where $\Omega_k$ is number of reported disease cases at time $h$
$m_h$          number of individuals out of $\Omega_h$ who are reported exactly once
[?]            difference between $\Omega_h$ and [?]
[?]            difference between $\Lambda_h$ and $m_h$, where $m_h <$ [?]
$K_h$          number of classes where $\Omega_h$ cases could be located
$\eta_h$       number of empty classes out of $K_h$

Table 1. Parameters and definitions 



